ABSTRACT


A new algorithm for estimating motion from image sequences
is presented. Initial motion estimates are determined from a
least-squares solution to a set of independent linear
constraints on the motion at a pixel. These initial estimates
are then improved by a nonlinear smoothing operation. The
results of this algorithm are compared with those obtained by
the Horn-Schunck algorithm [9] on a number of image sequences.

1.  Introduction

One  of  the  key  problems  in  analysis  of  time-varying  images 
is  the  computation  of  object  motion  from  frame  to  frame.  Huang 
[1]  identifies  three  approaches  to  motion  estimation  from  image 
sequences:

1)  Fourier methods

2)  Correspondence methods

3)  Differential methods.

Fourier methods are based on the observation that a 2-D
translation and/or rotation of an image has a simple effect on
the Fourier transform of the image. For example, a translation
results in a phase shift of the transform, so that differences
in the appropriate phase angles from frame to frame determine
the translation. For this approach to work, the object must be
translating across a homogeneous ("flat") background. Moreover,
extensions to rotation (and scale) involve more complicated
comparisons of transforms, which makes the approach
computationally unattractive.
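
As an illustration of the phase-shift property, the following is a minimal sketch for a 1-D signal with a circular shift, not the procedure of any cited work. The function names `dft` and `estimate_shift` and the sample row are hypothetical, and the sketch assumes an integer shift and a non-zero first harmonic.

```python
import cmath
import math

def dft(signal):
    """Plain O(n^2) discrete Fourier transform of a real sequence."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * math.pi * f * k / n) for k, x in enumerate(signal))
        for f in range(n)
    ]

def estimate_shift(frame_a, frame_b):
    """Circular shift s such that frame_b[k] == frame_a[(k - s) % n].

    A shift by s multiplies harmonic f by exp(-2j*pi*f*s/n), so the phase
    difference of the first harmonic determines s.
    """
    n = len(frame_a)
    ratio = dft(frame_b)[1] / dft(frame_a)[1]  # assumes a non-zero first harmonic
    return round(-cmath.phase(ratio) * n / (2 * math.pi)) % n

row = [0, 1, 3, 1, 0, 0, 0, 0]   # hypothetical 1-D "image" on a flat background
moved = row[-2:] + row[:-2]      # the same pattern translated by 2 samples
print(estimate_shift(row, moved))
```

Using only the first harmonic keeps the sketch short; a practical estimator would combine the phase differences of many harmonics to reduce the effect of noise.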

Correspondence methods are two-stage methods and involve

1)  finding image points in successive frames which correspond
to the projection of the same scene point, and

2)  using those correspondences to solve for the motion.

Step 2 is straightforward in principle, although it may involve
nonlinear parameter estimation (see Ullman [2], Huang [1], and
Huang and Tsai [3]). If the motion changes slowly with time,
then this is not a severe practical problem.

For step 1, we must determine reliably identifiable image
features which are invariant to motion. For example, if image
boundaries correspond to locally planar reflectance contours on
3-D objects, then curvature discontinuities and curvature
zero-crossings are features invariant to motion, and can be used
to compute the correspondences in step 1. For a survey of
matching algorithms for solving the correspondence problem,
see [4].

Differential methods are based on the relationship between
motion and spatial/temporal image intensity derivatives, namely
that

    $-I_t = I_x u + I_y v$    (1)

where $I_t$ is the temporal intensity derivative, $I_x$ and $I_y$
are the spatial intensity derivatives in the x and y directions,
and u and v are the components of the motion in the x and y
directions. Equation (1) can be derived using a Taylor series
expansion for the image function and assuming a) that gray level
is invariant to motion and b) that the image intensity function
is locally planar, i.e., higher-order spatial intensity
derivatives are zero. Equation (1) represents a single linear
constraint on u and v, so that while the component of motion in
the gradient direction (the "normal component") can be
determined, the component in the level direction, perpendicular
to the gradient, cannot. However, by combining the motion
constraints from a number of points, one can compute an estimate
of the image motion.
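
To make the normal component concrete, the following minimal sketch solves the constraint $-I_t = I_x u + I_y v$ for the part of (u, v) that lies along the gradient. The function name `normal_flow` and the numerical values are illustrative assumptions.

```python
def normal_flow(ix, iy, it):
    """Component of (u, v) along the intensity gradient (ix, iy).

    It is the shortest vector satisfying -it = ix*u + iy*v.
    Returns None where the gradient vanishes and the constraint is empty.
    """
    g2 = ix * ix + iy * iy
    if g2 == 0.0:
        return None
    scale = -it / g2
    return (scale * ix, scale * iy)

# Hypothetical pixel: gradient (3, 4) and true motion (2, 0.5),
# so the temporal derivative is -(3*2 + 4*0.5) = -8.
u_n, v_n = normal_flow(3.0, 4.0, -8.0)
print(u_n, v_n)
```

In this example the recovered vector is approximately (0.96, 1.28), not the true motion (2, 0.5): the difference (1.04, -0.78) is perpendicular to the gradient, which is exactly the component that a single constraint cannot determine.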


These constraints are combined based on assumptions about
the image motion vectors. For example, in [5,6] the assumption
is made that the image motion vectors are locally constant
(corresponding to an image plane translation), so that the
least-squares pseudo-intersection of the constraint lines in
(u,v) space represents the motion vector at each point. Since
such approaches require combining the normal velocity components
of neighboring points, they will not yield reliable estimates
near the boundaries of moving objects. In Section 2 of this paper
we show how multiple linear constraints on image motion can be
derived at individual points, so that a motion estimate can be
computed for a point without making any assumptions about the
spatial pattern of motion vectors.
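
The pseudo-intersection of several constraint lines can be sketched as the solution of the 2x2 normal equations. This is a minimal illustration, not the implementation of [5,6]; the function name `pseudo_intersection`, the row layout (a, b, c) meaning a*u + b*v = c, and the singularity threshold are assumptions.

```python
def pseudo_intersection(constraints, eps=1e-12):
    """Least-squares (u, v) for rows (a, b, c) meaning a*u + b*v = c.

    For the intensity constraint, a row is (I_x, I_y, -I_t).
    Returns None when all constraint lines are (nearly) parallel.
    """
    saa = sab = sbb = sac = sbc = 0.0
    for a, b, c in constraints:
        saa += a * a
        sab += a * b
        sbb += b * b
        sac += a * c
        sbc += b * c
    det = saa * sbb - sab * sab
    if abs(det) < eps:
        return None
    u = (sbb * sac - sab * sbc) / det
    v = (saa * sbc - sab * sac) / det
    return (u, v)

# Three hypothetical constraint lines that all pass through (u, v) = (1, -2).
rows = [(1.0, 0.0, 1.0), (0.0, 1.0, -2.0), (1.0, 1.0, -1.0)]
print(pseudo_intersection(rows))
```

When every gradient in the neighborhood points the same way, the determinant vanishes and the function returns None, which is the case in which only the normal component is recoverable.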

Unfortunately, the motion vectors computed using this approach
(or the pseudo-intersection approach) are not reliable, so it
is desirable to perform some (perhaps iterative) spatial
smoothing of the vectors. An important property of such
smoothing schemes is that they should not smooth across the
boundary of a moving object. In Section 3 of this paper we
present such a technique. We assume that the image motion is
locally a rigid 2½D motion (see also Schalkoff and McVey [7]),
and smooth the velocity vector at a point by associating it with
that one of its 8-neighbors whose neighborhood is "most smooth"
in this sense (compare Haralick [8]). For interior points near
the boundary of a moving object, we would expect such an
approach to choose a neighborhood which is completely contained
within the object.


2.  Multiple  constraint  equations  of  a  single  point 

The differential methods discussed in Section 1 are based
on the assumption that the image intensity corresponding to
any scene point is invariant to motion; i.e., if I(x,y,t) is
the intensity at position (x,y) and time t, and if the image
motion is (u,v), then I(x,y,t) = I(x+u,y+v,t+1). Since the
intensity is invariant to motion, so are various derivatives of
intensity. In particular, the gradient of intensity would be
invariant to motion, so that we can write a second linear
constraint on the image motion:

    $G_x u + G_y v = -G_t$    (2)

where $G_x$ and $G_y$ are the spatial derivatives of the image
gradient (i.e., directional second derivatives) and $G_t$ is the
temporal derivative of the gradient.

In general, if F denotes any motion-invariant feature, we
can produce the constraint equation

    $F_x u + F_y v = -F_t$    (3)

For example, F may be the gradient direction, the curvature of
the surface, the moments of local intensity distributions, or
higher-order derivatives of intensity. In practice, however,
features that are defined in terms of higher-order spatial
derivatives are not reliable, since the differentiation
operation tends to amplify noise.


Notice that we can also construct sets of constraint equations
from color images, since at least one constraint equation may be
obtained from each band. If more than two constraint equations
are available at any point, then a least-squares solution for
the pseudo-intersection of these constraints can be obtained
using pseudo-inverse techniques.
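
The pseudo-inverse solution can be written out as follows. This is a sketch in notation that is not in the original text: the feature index k, the matrix A, and the vector b are introduced here for illustration. Stacking K constraints of the form $F_x u + F_y v = -F_t$, one per feature $F^{(k)}$ at a single point, gives

```latex
A = \begin{pmatrix} F^{(1)}_x & F^{(1)}_y \\ \vdots & \vdots \\ F^{(K)}_x & F^{(K)}_y \end{pmatrix},
\qquad
b = -\begin{pmatrix} F^{(1)}_t \\ \vdots \\ F^{(K)}_t \end{pmatrix},
\qquad
\begin{pmatrix} u \\ v \end{pmatrix} = A^{+} b = (A^{T} A)^{-1} A^{T} b .
```

The last equality holds only when $A^{T} A$ is non-singular, i.e., when at least two of the constraint lines are not parallel.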

3.  Iterative algorithm for motion enhancement

The velocity vectors computed either by spatially integrating
constraints from local neighborhoods of points or by combining
multiple constraint equations from a single point are often
inaccurate. In this section we introduce an iterative algorithm
for enhancing such motion vectors.

The method for smoothing motion vectors is based on the
following two assumptions:

1.  Objects are rigid, and

2.  Objects are undergoing 2½D motion, i.e., they move
along a plane perpendicular to the line of sight with
arbitrary translational and rotational velocity.

We iteratively update the velocity at point P on the image
as follows. First, choose the neighbor of P whose 3×3
neighborhood of motion vectors best fits a 2½D rigid motion.
Let the rotational velocity in that neighborhood be $\omega$.
Since the rotational velocity is constant over a single moving
object, we regard $\omega$ as the rotational velocity for P and
update the velocity at P. The details are described below.

Figure 1 shows an object moving with angular velocity $\Omega$
and translational velocity $v_T$. $O'$ is the center of rotation.
For an arbitrary point P on the object surface, the resultant
velocity v is

    $v = v_T + r \times \Omega$    (4)

Given the image motion vectors at P and P', we compute $\omega'$
(the rotational velocity of P' about P) as follows:

    $\Delta v = v' - v$    (5)

    $\Delta r = R' - R = r' - r$    (6)

The rotational velocity $\omega'$ at P' with respect to P is then

    $\omega' = \frac{\Delta v \times \Delta r}{\|\Delta r\|^2}$    (7)

The rotational velocity at P is obtained by averaging $\omega'$
for all points in a neighborhood, N, of P of size n:

    $\omega = \frac{1}{n} \sum_{i \in N} \omega'_i$    (8)
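
The computation in equations (5)-(8) can be sketched for image-plane vectors as follows. This is a minimal illustration under stated assumptions: the rotation axis lies along the line of sight, so the rotational velocity reduces to a scalar, and the sign follows the convention of equation (4), $v = v_T + r \times \Omega$, under which $\Delta v = \Delta r \times \omega'$. The function names and the test field are hypothetical.

```python
def cross2(a, b):
    """z-component of the cross product of two image-plane vectors."""
    return a[0] * b[1] - a[1] * b[0]

def relative_rotation(p, v, p_prime, v_prime):
    """Scalar rotational velocity of P' about P from two motion vectors."""
    dr = (p_prime[0] - p[0], p_prime[1] - p[1])   # equation (6)
    dv = (v_prime[0] - v[0], v_prime[1] - v[1])   # equation (5)
    return cross2(dv, dr) / (dr[0] ** 2 + dr[1] ** 2)

def mean_rotation(p, v, neighbours):
    """Average of the relative rotations over (position, velocity) pairs."""
    values = [relative_rotation(p, v, q, vq) for q, vq in neighbours]
    return sum(values) / len(values)

def rigid_velocity(x, y, t=(1.0, 2.0), w=0.5):
    """Hypothetical rigid field t + r x (0, 0, w), used only as test data."""
    return (t[0] + w * y, t[1] - w * x)

p = (0, 0)
neighbours = [((x, y), rigid_velocity(x, y))
              for x in (-1, 0, 1) for y in (-1, 0, 1) if (x, y) != (0, 0)]
print(mean_rotation(p, rigid_velocity(*p), neighbours))
```

On noise-free data from a single rigid motion every neighbor yields the same value, so the average only matters once the motion vectors are noisy.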


To compute motion along the line of sight, we consider the
quantity

    $D = \sum_{i \in N} \frac{\cdots}{\|\Delta r_i\|}$    (9)

D is a measure of dilation, which reflects motion along the line
of sight. For 2½D object motion D should be constant for all
points in a moving object. (D > 0 if the relative depth is
decreasing.)

Next we consider the following two error measurements at an
arbitrary point i:

    $E^{(1)}_i = \sum_{j \in N} \|\omega'_j - \omega\|^2$    (10)

To update the velocity at point P, we choose a point P' from P's
neighborhood such that the linear combination $E_{P'}$ of the
above two errors is a minimum at P':

    $E_{P'} = \min_{j \in N} \left[ E^{(1)}_j + a E^{(2)}_j \right]$    (12)

where $E^{(1)}_j$ and $E^{(2)}_j$ denote the two error measures
at point j and a is a scalar. The velocity v at P is then
adjusted by assuming that P has the rotational velocity
$\omega'$ with respect to P' and computing

    $v = v' - \Delta r \times \omega'$    (13)

where v' is the translational velocity at P' and $\Delta r$ is
the vector joining P and P'. An important advantage of this
approach is that since the error measurements along a moving
boundary are relatively large, the enhancement tends not to
combine the motion estimates across such boundaries.
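
The selection and update steps of equations (12) and (13) can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the rotation is taken to be a scalar about the line of sight, $\Delta r$ is taken as the position of P' minus the position of P, and the two error values per neighbor are supplied by the caller. The function names and numbers are hypothetical.

```python
def pick_neighbour(errors, a):
    """Index j minimising errors[j][0] + a * errors[j][1], as in equation (12)."""
    return min(range(len(errors)), key=lambda j: errors[j][0] + a * errors[j][1])

def update_velocity(p, p_prime, v_prime, w_prime):
    """Equation (13): v = v' - dr x w', with w' along the line of sight.

    For dr = (dx, dy, 0) and w' = (0, 0, w), dr x w' = (dy*w, -dx*w, 0).
    """
    dx = p_prime[0] - p[0]
    dy = p_prime[1] - p[1]
    return (v_prime[0] - dy * w_prime, v_prime[1] + dx * w_prime)

# Hypothetical error pairs for three candidate neighbours.
errors = [(0.4, 1.0), (0.1, 0.2), (0.05, 3.0)]
best = pick_neighbour(errors, a=0.5)

# Hypothetical rigid motion: translation (1, 2) and rotation 0.5, so the
# velocity at P' = (1, 1) is (1.5, 1.5) and the velocity at P = (0, 0) is (1, 2).
print(best, update_velocity((0, 0), (1, 1), (1.5, 1.5), 0.5))
```

Because the update uses only the velocity and rotation of the selected neighbor, a noisy vector at P is replaced outright by the value predicted from the best-fitting neighborhood.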




4.  Experiments 

In this section we describe a set of experiments designed
to demonstrate the behavior of the motion estimation and
smoothing algorithms described in the previous sections. For
comparison, we also implemented one other motion estimation
algorithm (the pseudo-intersection method mentioned in Section 1)
and one other motion smoothing algorithm (the one described in
Horn and Schunck [9], although note that in [9] the original
motion vector field was just the field of normal components).

The input image sequences are displayed in Figures 2-4.
Figure 2 shows a sequence that contains two moving cars. Figures
3 and 4 are synthetic images. In Figure 3, a sphere rotates
in the image plane while approaching the viewer (a 2-D rotation
with zoom). In Figure 4, the same sphere undergoes a 3-D
rotation while approaching the viewer and translating towards
the right. The intensity corresponding to any point on the sphere
is invariant to the motion.

Although, in principle, it is possible to use the multiple
constraint method to compute a velocity vector at each point, in
fact a number of practical considerations limit the set of points
at which useful estimates can be obtained to points:

1)  having non-zero spatial and temporal intensity derivatives,

2)  having non-singular matrices corresponding to the multiple
constraint equations, and

3)  for which the estimated motion is small (i.e., vector
magnitude less than 5 pixels).


Figure 5a shows the motion estimates for the multi-constraint
method for frame 6 of the moving car sequence. All spatial
derivatives are based on fitting a quadratic surface to a 5×5
neighborhood, and all temporal gradients are based on a
quadratic approximation to 5 consecutive temporal points. For
comparison, Figure 5b shows the motion vectors computed by the
pseudo-intersection method. They are much better approximations
to the actual motion, although, as we shall see, the motion
enhancement algorithm plays a larger role than the initial
estimates in determining the accuracy of the final motion vector
field.
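
The derivative estimates described above can be sketched with a least-squares quadratic fit. This is a minimal illustration that assumes unit sample spacing and an ordinary, unweighted fit; the exact fitting procedure used in the experiments is not stated. The function names are hypothetical.

```python
def quadratic_fit_derivative(samples):
    """Slope at the centre of the least-squares quadratic through 5 samples.

    With abscissae -2..2 the linear coefficient decouples from the others
    and equals sum(t * f(t)) / sum(t * t), where sum(t * t) = 10.
    """
    if len(samples) != 5:
        raise ValueError("expected exactly 5 samples")
    f_m2, f_m1, _, f_p1, f_p2 = samples
    return (-2.0 * f_m2 - f_m1 + f_p1 + 2.0 * f_p2) / 10.0

def surface_fit_dx(window):
    """x-derivative at the centre of a quadratic surface fitted to a 5x5 window.

    window[row][col] with col running along x; for the full quadratic
    surface the x coefficient is the mean of the five row-wise slopes.
    """
    return sum(quadratic_fit_derivative(row) for row in window) / 5.0

# Temporal samples of f(t) = 3 + 2t - t^2 at t = -2..2; the slope at t = 0 is 2.
print(quadratic_fit_derivative([-5.0, 0.0, 3.0, 4.0, 3.0]))
```

The same row-wise routine applied to columns gives the y-derivative, so one 5-point kernel covers both the spatial and the temporal case.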

Next, we consider three combinations of initial motion
estimation and motion enhancement:

A)  Multi-constraint initial estimation with Horn and
Schunck motion enhancement.

B)  Pseudo-intersection initial estimation with Horn and
Schunck motion enhancement.

C)  Pseudo-intersection initial estimation with the motion
enhancement algorithm of Section 3.

By comparing A and B we can evaluate the role of the initial
estimates in the overall motion computation, while the
comparison of B with C can demonstrate whether the
computationally more costly algorithm of Section 3 (which is
designed to avoid smoothing over motion boundaries) has its
higher cost justified by better motion estimates.

In order to evaluate the results quantitatively, we consider
two error measures. The first measure was designed to enable us
to evaluate motion estimates on sequences for which the actual
motion is unknown, and is based on measuring how well the motion
vectors predict intensity from one frame to the next. This
measure, $E_p$, is defined as


where:

a)  $I_i^t$ is the intensity of pixel i at time t,

b)  L is the set of pixels at which the estimated motion vector
is non-zero, and

c)  n is the size of L.


The second measure, $E_v$, compares the estimated motion vectors
with the true motion vectors, and is defined as

    $E_v(t)$

where

a)  $v'_{i,t}$ is the true motion vector at point i at time t,

b)  $v_{i,t}$ is the estimated motion vector at point i at time t,

c)  K is the set of points having non-zero true motion, and

d)  m is the size of the set K.
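
The defining expression for the second measure is not legible in this copy. The following sketch therefore shows only one plausible form that is consistent with items a) to d): the mean Euclidean distance between true and estimated vectors over the set K. Whether the original uses this norm, its square, or a root-mean-square is an assumption, and the function name `velocity_error` is hypothetical.

```python
import math

def velocity_error(true_v, est_v):
    """Mean distance between true and estimated vectors over the set K.

    K is the set of indices whose true motion is non-zero; m = len(K).
    ASSUMED form: the averaging and the choice of norm are not taken
    from the source.
    """
    k = [i for i, (tu, tv) in enumerate(true_v) if (tu, tv) != (0.0, 0.0)]
    if not k:
        return None
    total = sum(
        math.hypot(est_v[i][0] - true_v[i][0], est_v[i][1] - true_v[i][1])
        for i in k
    )
    return total / len(k)

true_field = [(0.0, 0.0), (3.0, 4.0), (1.0, 0.0)]   # hypothetical data
est_field = [(5.0, 5.0), (0.0, 0.0), (1.0, 0.0)]
print(velocity_error(true_field, est_field))
```

Points with zero true motion are excluded, so a spurious estimate on the static background (the first point above) does not contribute to the value.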

Figures 6a-d show $E_p$ for four frames of the car sequence in
Figure 2. We can make the following observations about these
graphs (the observations also hold for other frames in this
sequence):

1)  The final motion vectors are not particularly sensitive
to the choice of initial estimate, but appear to depend
more critically on the enhancement algorithm. (Compare
A and B.)

2)  Most of the enhancement takes place during the first
two iterations of either enhancement algorithm.

3)  After 3 iterations the differences between all three
approaches are insignificant, so that we would choose among
them based on computational cost (which would lead to a
choice of B, pseudo-intersection with Horn and Schunck).
Since the multi-constraint method offers no practical
advantages over the pseudo-intersection method for initial
motion estimation, we will not consider method A in the
remaining examples.

Consider  next  the  spheres  in  Figures  3  and  4.  Figures  7  and 
8  show  Ev  for  one  frame  from  each  of  these  sequences. 

Here, curve A corresponds to the nonlinear enhancement
algorithm of Section 3 and curve B corresponds to the
Horn-Schunck algorithm. Furthermore, we have decomposed the
total error into two components: one corresponding to a region
near the border of the sphere (A-1 and B-1), and the second
corresponding to the interior of the sphere (A-2 and B-2). We
observe that for both the 2-D and the 3-D motion, the error
component due to the boundary is smaller for the nonlinear
enhancement algorithm than for the Horn-Schunck algorithm. Thus,
near the boundary at least, the search for a "best" neighborhood
in which to compute the enhancement does lead to more accurate
motion estimates. Considering the component of error on the
interior of the sphere, we note that both the nonlinear
algorithm and Horn-Schunck produce very accurate motion
estimates for the 2½D motion (errors of ~0.1). This is not
surprising, since the nonlinear algorithm is explicitly based on
a 2½D motion assumption, while, as regards Horn-Schunck, the
Laplacian of a 2½D motion vector field is zero.
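
The statement about the Laplacian can be checked numerically. The following minimal sketch assumes a velocity field made of a translation, a rotation about the line of sight, and a uniform dilation; the dilation term is an assumed model for motion along the line of sight, and the function names and parameter values are hypothetical.

```python
def rigid_field(x, y, t=(1.0, -0.5), w=0.2, d=0.1):
    """Velocity at (x, y): translation t, rotation w, and dilation d."""
    return (t[0] + w * y + d * x, t[1] - w * x + d * y)

def laplacian(field, x, y):
    """Five-point discrete Laplacian of each velocity component at (x, y)."""
    centre = field(x, y)
    nbrs = [field(x + 1, y), field(x - 1, y), field(x, y + 1), field(x, y - 1)]
    return tuple(sum(n[k] for n in nbrs) - 4.0 * centre[k] for k in range(2))

print(laplacian(rigid_field, 3, -2))
```

Both components vanish up to rounding because the field is linear in x and y, so the average of the four neighbors equals the center value and a smoothing step based on that average leaves such a field unchanged.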

The two algorithms produce similar, but higher, errors in the
3-D motion case, the slightly better performance of the nonlinear
algorithm being perhaps attributable to the search for a best
neighborhood.


5.  Conclusions 


Based  on  the  experiments  presented  in  Section  4  we  can 
draw  the  following  conclusions: 

1.  The  multi-constraint  algorithm  does  not  produce 
reliable  motion  estimates. 

2.  The  enhancement  algorithm  plays  a  much  larger  role 
than  the  initial  estimates  in  determining  the  utility 
of  the  final  motion  estimates. 

3.  Enhancing  the  motion  estimate  of  a  pixel  based  on  first 
searching  for  the  "best"  neighborhood  containing  that 
pixel  (i.e.,  the  neighborhood  whose  motion  estimates 
best  satisfy  the  given  motion  model)  yields  much  more 
accurate  motion  estimates  near  the  borders  of  moving 
regions.

Fig. 1.  A rigid object under general 3-D motion. $O'$ is the
instantaneous center of rotation, at which the velocity is $v_T$.

Fig. 2.  Moving cars.

Fig. 6a.  A - Multi-constraint with Horn-Schunck.
          B - Pseudo-intersection with Horn-Schunck.
          C - Pseudo-intersection with the facet-like enhancement
              of Section 3.

Fig. 6b.